SECTION I


INTRODUCTION  and  OVERVIEW 

For many years researchers in government, business and industry
have been interested in developing means of predicting the
future. Usually these predictions or estimates are highly
dependent on past experience, on the premise that the past is a
useful guide to the future. Thus studies are made to predict such
things as the time it takes to complete construction of a building
based upon the size of the labor force, the current gain of a
transistorized electrical component based on its sheet resistance
and diffusion time, the demand for a product (such as energy) based
upon its price, the amount spent on advertising and expenditures on
product distribution, the cost of a piece of avionics equipment
based upon its physical and electrical characteristics, and so
forth.

There  have  been  many  approaches  to  prediction  ranging  from  hard 
objective  evidence  to  pure  speculation.  The  approach  taken  in 
this  study  has  been  used  for  over  a  century  and  is  an  area  of 
statistics  called  regression  analysis.  Inherent  in  the 
interpretation  of  the  words  "prediction"  or  "estimate"  is  the 
term  uncertainty.  It  would  be  nice  to  make  "exact"  predictions, 
but  this  is  rarely  the  case  when  dealing  with  a  mass  of 
statistical data. Thus, statisticians do not profess to estimate
exactly; rather, they profess that their predictions are "on the
average" reasonably close. The basic concept of regression
analysis, then, is to estimate the average value of a given
variable (called the dependent variable) in terms of the known
values of one or more other variables (called independent
variables). Regression analysis expresses the relationship of
these  variables  by  determining  the  form  of  a  mathematical 
equation  connecting  them. 

The  major  reference  on  the  subject  of  regression  analysis  that  is 
noted  throughout  this  report  is  a  book  written  in  1971  by 
C.  Daniel  and  F.  Wood  entitled  Fitting  Equations  to  Data  (1) 
which  describes  a  most  powerful  computer  program  called  the 
"Linear  Least-Squares  Curve-Fitting  Program"  (LLSCFP).  The 
proposals  presented  in  (1)  have  been  successfully  discussed  in 
seminars  at  many  distinguished  worldwide  universities  as  well  as 
the  Bell  Telephone  Laboratories  and  the  National  Cancer 
Institute.  The  LLSCFP  has  also  been  the  most  sought  after 
program  in  both  the  SHARE  (IBM)  and  VIM  (CDC)  libraries  of 
computer  programs,  and  has  also  been  converted  to  run  in  East 
Germany  and  Russia.  These  techniques  have  been  applied  in  a  wide 
range  of  areas  including  studies  by  government  agencies  of 
variables  for  pollution  control,  searches  for  influential 


(1)  Fitting  Equations  to  Data.  Computer  Analysis  of  Multifactor 
Data  for  Scientist  and  Engineers,  C.  Daniel  and  F.  S.  Wood 
with  the  assistance  of  J.  W.  Gorman,  Wiley,  (1971). 
variables which cause cancer, studies to estimate hospital costs,
studies in the conservation of energy and the evaluation of moon
rocks at the Johnson Space Center. In addition, a Bureau of Labor
Statistics study has shown that the coefficients estimated by the
LLSCFP are accurate to 15 digits. It is felt, then, that the
proposals and techniques presented in (1) are the "state of the
art" in regression analysis.

The  purpose  of  this  particular  study  is  to  estimate  the 
Operations and Maintenance (O&M) cost of avionics equipment based
upon  the  equipment's  physical  and  electrical  characteristics, 
avionics  area  and  type  of  aircraft  in  which  the  equipment  is 
used.  The  tool  used  to  estimate  avionics  O&M  cost  is  a  computer 
model  developed  by  Westinghouse  Logistics  Engineering  for  the  Air 
Force  Avionics  Laboratory  (AFAL)  and  is  called  the  Avionics 
Laboratory  Predictive  Operations  and  Support  (ALPOS)  Model. 

ALPOS basically consists of fifteen estimating relationships
developed for the logistics, support and cost parameters: Mean
Time Between Failure (MTBF); Mean Time Between Maintenance Action
(MTBMA); Total Maintenance Manhours per Operating Hour (MHTOT);
Unscheduled Maintenance Manhours per Operating Hour (MHUNS); Shop
Maintenance Manhours per Operating Hour (MHSHO); Total Logistics
Support Cost per Operating Hour (LSCTO); Field Logistics Support
Cost per Operating Hour (LSCFD); the fraction Not Repairable This
Station (NRTS); Depot Specialized Repair Activity (SRA) costs;
and the Training Cost per Operating Hour (TRAIN). The approach
to developing ALPOS was to collect data consisting of 20
independent variables covering a wide spectrum of aircraft types,
avionics areas and levels of complexity, and to develop cost and
parametric estimating relationships via multiple regression
analysis.

Many researchers are interested in determining some causal (i.e.,
cause-and-effect) relationships between the independent and
dependent  variables  with  specific  emphasis  on  the  magnitude  of 
the  regression  coefficients.  Their  main  objective  then  is  to 
determine  the  correct  functional  form  of  the  relationships. 

There are examples in regression analysis in which the form of the
estimating relationship is assumed on the basis of a previous study
or of technical knowledge of the process studied. However, there
has been no previous study devoted to developing relationships
for avionics equipment in as many as 20 independent variables, nor
is enough known about avionics equipment to supply technical
knowledge of the correct functional form. In this study,
therefore, the main objective is to develop (estimating)
relationships which can be successfully used to predict future
events. All independent variables which are "assumed" to have an
influential effect on the dependent variable to be estimated are
considered simultaneously, and the statistics and techniques lead
the direction in obtaining the equations that yield the "best"
possible predictions.
The ideal situation would be that both objectives (correct
functional  form  and  useful  estimating  relationships)  are 
accomplished  simultaneously.  To  assist  in  obtaining  the  "best" 
possible  equations,  three  forms  or  transformations  of  the 
independent  variables  (namely  the  variable,  its  square  and  its 
natural  logarithm)  and  two  forms  of  the  dependent  variable  (the 
dependent  variable  and  its  natural  logarithm)  are  used  in  this 
report.  Thus,  such  complicated  functional  forms  as: 

$$ y = b_0 + b_1 x_1 + b_2 x_2^2 + b_3 \ln x_3 + \cdots $$

and

$$ y = b_0 \, e^{b_1 x_1} e^{b_2 x_2} $$

are considered as means of estimating advanced avionics equipment
costs, where $y$ is the dependent variable, the $x_i$'s are the
independent variables, the $b_i$'s are the regression coefficients,
and $e$ is the base of the exponential function.
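
The candidate forms described above can be sketched in a few lines. The following is a minimal illustration, not the LLSCFP itself: the function names and the sample values are hypothetical, and the only point is that each independent variable contributes three candidate terms and the dependent variable two candidate forms. Taking the natural logarithm of the second functional form gives ln y = ln b_0 + b_1 x_1 + b_2 x_2, which is again linear in the coefficients.

```python
import math

def candidate_terms(x):
    """Three candidate forms of one independent variable:
    the variable itself, its square, and its natural logarithm."""
    if x <= 0:
        raise ValueError("natural logarithm requires a positive value")
    return {"x": x, "x_squared": x * x, "ln_x": math.log(x)}

def candidate_responses(y):
    """Two candidate forms of the dependent variable."""
    if y <= 0:
        raise ValueError("natural logarithm requires a positive value")
    return {"y": y, "ln_y": math.log(y)}

# Hypothetical values for illustration only (not taken from the report's data).
terms = candidate_terms(4.0)
responses = candidate_responses(math.e)
```

Variables that can be zero or negative would need a different treatment before the logarithmic form could be tried; the sketch simply rejects them.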

A prerequisite for using these prediction equations is that the
physical and electrical characteristics (independent variables)
of the proposed piece of avionics equipment for which a
dependent variable estimate is desired must be within the
multidimensional region covered by the regression data, i.e., the
equation should be used for interpolation, not extrapolation.
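
A simple screening check for this prerequisite can be sketched as follows. It is an assumption-laden illustration: the function name and the sample numbers are hypothetical, and a variable-by-variable range check is only a necessary condition. A candidate can pass every individual range and still lie outside the multidimensional region covered by the data when the independent variables are correlated.

```python
def variables_outside_range(candidate, observations):
    """Return the indices of the independent variables whose candidate value
    falls outside the minimum-to-maximum range seen in the regression data."""
    outside = []
    for i, value in enumerate(candidate):
        column = [row[i] for row in observations]
        if not (min(column) <= value <= max(column)):
            outside.append(i)
    return outside

# Hypothetical regression data: three observations of two independent variables.
observations = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
inside = variables_outside_range([2.0, 25.0], observations)
extrapolated = variables_outside_range([4.0, 15.0], observations)
```

An empty list means no single variable is being extrapolated; a non-empty list names the variables for which the estimate would rest on extrapolation.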

It is to be emphasized that the "total picture" of the results
that are displayed and analyzed, with the intent of quantifying the
reliability of the relationships developed, includes over thirty
statistics, five types of plots, and several techniques and
different tabular arrangements of the data that are available in
the computer printouts of the LLSCFP. This document includes a
discussion of the concepts of regression analysis including the
plots and techniques used. Volume II of the Phase I final report
(2) includes a discussion of the statistics used, in addition
to a sketch of an example demonstrating the approaches and
procedures utilized to develop the parametric estimating
relationship for the support parameter MTBMA, using the Phase I
data. This document also includes a discussion of the techniques
used to attempt the demonstration of the predictive validity of the
relationships developed.


(2)  Avionics  Laboratory  Predictive  Operations  and  Support  Model, 
Final  Report,  (Phase  I)  Volume  II,  E.  E.  Feltus,  March 
1978. 
SECTION II


MULTIPLE  REGRESSION  ANALYSIS 


Very  often  in  practice  a  relationship  connecting  two  variables 
(one  independent  and  one  dependent)  is  desired.  The  equation 
most widely used is the linear equation (in two unknowns $\beta_0$
and $\beta_1$),

$$ y = \beta_0 + \beta_1 x_1 \tag{1} $$

If all pairs of values of $x_1$ and $y$, when plotted in a scatter
diagram  on  ordinary  graph  paper,  fall  on  or  near  a  straight  line, 
equation  (1)  is  the  correct  form  of  the  relationship  to  be  used. 
If  in  a  scatter  diagram  plotted  on  semi-log  paper,  the 
observations  fall  near  a  straight  line,  then  the  exponential 
curve,

$$ y = \beta_0 e^{\beta_1 x_1} \tag{2} $$

is  the  appropriate  choice.  If  a  straight  line  is  obtained  on 
log-log  paper,  the  geometric  curve, 

$$ y = \beta_0 x_1^{\beta_1} \tag{3} $$


is  appropriate. 
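
The graph-paper rules above amount to fitting a straight line after a transformation. The sketch below is illustrative only: it assumes the exponential curve is parametrized as y = β_0 exp(β_1 x_1) and the geometric curve as y = β_0 x_1^β_1, and it uses made-up, error-free data. The helper name `fit_line` is hypothetical.

```python
import numpy as np

def fit_line(u, v):
    """Least-squares straight line v = a + b*u; returns (a, b)."""
    b, a = np.polyfit(u, v, 1)  # polyfit returns the highest power first
    return a, b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Exponential data y = 2 * exp(0.5 x): a straight line on semi-log axes (ln y vs x).
y_exp = 2.0 * np.exp(0.5 * x)
ln_b0, b1 = fit_line(x, np.log(y_exp))
exp_params = (np.exp(ln_b0), b1)

# Geometric data y = 3 * x**1.5: a straight line on log-log axes (ln y vs ln x).
y_geo = 3.0 * x ** 1.5
ln_c0, c1 = fit_line(np.log(x), np.log(y_geo))
geo_params = (np.exp(ln_c0), c1)
```

In both cases the intercept of the transformed line is the logarithm of β_0, so it is exponentiated to recover the constant.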

In  many  cases,  as  in  estimating  advanced  avionics  equipment 
costs,  one  independent  variable  does  not  provide  enough 
information  to  accurately  predict  the  dependent  variable.  For 
instance, if an estimate (using a regression equation) of the
Logistics Support Cost per Operating Hour (LSC/OH) of a piece of
avionics equipment is desired, the use of only the weight of the
equipment (one independent variable) can lead to less precise
estimates and unstable regression equations. Considering
additional independent variables can, in most cases, lead to more
accurate estimates, since more information should lead to better
predictions.

In  this  study  there  are  20  independent  variables  initially 
considered  which  are  assumed  to  have  a  significant  impact  on  the 
logistics,  support  and  cost  parameters  of  avionics  equipment. 

The problem of fitting an equation to data becomes more
difficult to disentangle as the number of independent variables
increases. When two or more independent variables are considered
in a regression exercise, scatter diagrams and other graphical
methods are often useless when trying to determine the form of
the assumed equation. For instance, if the two independent
variables $x_1$ and $x_2$ are considered, an $x_1$ - $y$ scatter
diagram might indicate a high linear relationship whereas the
$x_2$ - $y$ and $x_1$ - $x_2$ scatter diagrams may show no apparent
correlation, even though the true form of the equation is

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \tag{4} $$

Therefore, graphical techniques are not considered as an
alternative for finding the correct form of equation to be used
when multiple variables are considered. Instead, many types of
statistics ("Global" and "Local"), statistical printouts and
tables, statistical plots and statistical techniques are used
extensively and examined critically together in order to find the
equation which best approximates the data.

THE  METHOD  OF  LEAST-SQUARES 

The form of the equations considered throughout this report can
be written (or transformed) into the linear equation in $(K + 1)$
unknowns

$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K \tag{5} $$

where $y$ is the dependent variable, $x_1, \ldots, x_K$ are the $K$
independent variables, and $\beta_0$ (the constant) and the $K$
coefficients $\beta_1, \ldots, \beta_K$ make up the unknown
$(K + 1)$ population parameters.

Also it is assumed that there are $N$ observations (pieces of
equipment) in the sample indexed by $j$. Thus $y_j$ represents the
(observed) $j$th observation of the dependent variable and $x_{ij}$
the $j$th observation of the $i$th independent variable. Regression
analysis requires that the analyst find statistics
$b_0, b_1, \ldots, b_K$ which "best" approximate the unknown
$(K + 1)$ population parameters (where we have taken a sample from
the population of all avionics equipment), and whose fitted
equation

$$ Y = b_0 + b_1 x_1 + \cdots + b_K x_K \tag{6} $$


gives the "best" possible prediction. The method most widely
used by statisticians to accomplish this is called the
method of least-squares, which says:

"Find the values of the constants in the assumed
equation that minimize the sum of the squared
deviations of the observed values from those estimated
by the equation."

In other words, minimize

$$ Q = \sum_{j=1}^{N} (y_j - Y_j)^2 $$

where $Y_j$ is the estimate of the $j$th observation of the
dependent variable obtained by (6). Once the estimates
$b_0, b_1, \ldots, b_K$ are found, substituting the values of the
independent variables in (6) yields the estimate of the dependent
variable $Y$. We thus find ourselves in an area of statistics
called "Inductive Statistics" which uses the concepts of
"Statistical Inference" to make generalizations (or estimates) of
population parameters based upon a given sample of the population,
and to quantify the reliability of the estimates obtained. In
order to make these generalizations, however, the data must
satisfy certain assumptions.
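
The least-squares principle stated above can be illustrated with a short numerical sketch. This is not the LLSCFP; it uses NumPy's general least-squares solver on made-up, error-free data, and the function name `least_squares_fit` is hypothetical.

```python
import numpy as np

def least_squares_fit(X, y):
    """Return (b, fitted, Q), where b = (b0, b1, ..., bK) minimizes
    Q = sum over j of (y_j - Y_j)^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # column of ones carries b0
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ b
    Q = float(np.sum((y - fitted) ** 2))
    return b, fitted, Q

# Made-up data generated from y = 1 + 2*x1 - 3*x2 with no error term.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
b, fitted, Q = least_squares_fit(X, y)
```

Because the sample data contain no error, the fit recovers the generating coefficients and Q is zero to rounding; with real data Q is positive and its size relative to the residual degrees of freedom feeds the statistics discussed later.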
ASSUMPTIONS OF THE METHOD OF LEAST-SQUARES

There  are  four  major  assumptions  which  the  data  must  satisfy  in 
order  to  use  the  techniques  of  least-squares  estimation.  They 
are: 

A1. The data is "good" data.

A2. The correct form of the equation has been chosen,
e.g., $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K$.

A3. The independent variables are constant, non-random
variables, measured without error.

A4. All error is in the observations of the dependent
variable $y$, i.e.

$$ y_j = \beta_0 + \beta_1 x_{1j} + \cdots + \beta_K x_{Kj} + e_j $$

where $e_j$ represents random error. Moreover, the $e_j$ are
normally distributed independent random variables with
mean zero and constant, though unknown, variance $\sigma^2(y)$.

If all four of the above assumptions hold or "approximately" hold,
then the least-squares approach will give the best estimates of
the coefficients in the relationships. Past experience indicates,
moreover, that slight departures from the assumptions of
normality and equal variances have little effect on the results.

Since these assumptions are the basis for the method of
least-squares estimation and hence regression analysis, much
emphasis must be placed on determining how closely the data fit
the assumptions.

THE  "GLOBAL"  STATISTICS 

As  stated  previously,  in  addition  to  estimating  the  coefficients, 
a  means  is  needed  to  determine  how  "good"  these  estimates  are. 

Table 1 gives the notation and formulas for 36 "global"
statistics which must be used and simultaneously considered to
verify the "goodness of fit" of the relationships obtained. The
reader is referred to Volume II of the Phase I final report (2)
for an explanation of what these statistics are and how they are
used in this study to fit equations to data.

THE  "LOCAL"  OR  "INTERIOR"  STATISTICS 

All  the  previous  statistics  in  Table  1  fall  under  the  heading  of 
"global"  statistics  in  that  they  are  statistics  of  the  entire  set 
of  data.  The  "global"  statistics  are  helpful  in  determining  how 
the  independent  variables  influence  the  fitted  equations,  but 
they  do  not  describe  how  the  observations  (the  interior  of  the 
data) in multifactor space affect the fit. The four innovative
"interior" statistics in Table 2 (see [1]) can assist in such
things as detecting outliers; indicating observations which may
influence the form of the equation (possibly introducing
curvature); detecting those observations which have the largest
effect on the assumed equation; finding those observations which
are taken under approximately the same $x_i$-conditions (called
nearby neighbors); and testing the validity of the "global"
statistics. These nearby neighbors are used to obtain a less
biased estimate of our error of prediction, $\sigma^2(y)$. The
interior statistics defined are weighted by the $b_i$ values so as
to reduce the effects of uninfluential factors.


TABLE 1

Notation And Formulas For The "GLOBAL" Statistics

 1.  Number of observations.                    $N$
 2.  Number of independent variables.           $K$
 3.  Index for observations.                    $j = 1, \ldots, N$
 4.  Index for independent variables.           $i = 1, \ldots, K$
 5.  Independent variables.                     $x_{ij}$
 6.  Dependent variable.                        $y_j$
 7.  Unknown parameters (coefficients).         $\beta_i$
 8.  Measurement error.                         $e_j$
 9.  Error variance.                            $\sigma^2(y)$
10.  Statistical model: error normal,           $y_j = \beta_0 + \sum_{i=1}^{K} \beta_i x_{ij} + e_j$
     independent, constant variance.
11.  Sums of variables (independent and
     dependent).
12.  Means of variables.
13.  Root mean squares of variables.
15.  Minimum value of variables.
16.  Range of variables.


STATISTICAL  PRINTOUTS  AND  TABLES 

The statistics ("global" and "interior") constitute a major
portion of the results used to evaluate a multiple regression
equation, but by no means do the statistics give the "total
picture". There are statistical printouts and tables,
statistical plots and statistical techniques, which must also be
considered with the statistics to get a view of the "total
picture" and to adequately evaluate the results. A printout of
the statistics generated by the LLSCFP for the estimating
relationship developed for the support parameter Mean Time
Between Maintenance Actions (MTBMA), where ln(MTBMA) is the
dependent variable, is shown on pages 65 through 73, Appendix B.

Appendix B contains the results of the estimating relationships
developed for each variable considered in this study and will be
further elaborated on in Section IV. Page 65 displays many of
the preliminary statistics including the sums, residual sum of
squares and cross products, means and root mean squares of the
variables (independent and dependent), the simple correlation
coefficients and the elements of the inverse matrix. Page 65
shows many other "global" statistics including the coefficients
($b_i$), the standard error of the coefficients ($s(b_i)$), the
t-values ($t_i$), $R_i^2$, the minimum, maximum, range and relative
influence of each independent variable $x_i$. In addition the
number of observations ($N$), the number of independent variables
($K$), the residual degrees of freedom ($N - K - 1$), the F-value,
residual mean square (RMS) and the multiple correlation
coefficient squared ($R_y^2$) are displayed.


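TABLE 2

Notation And Formulas For The "Interior" Statistics

Component effects.                   $C_{ij} = b_i (x_{ij} - \bar{x}_i)$

Weighted squared standardized        $\mathrm{WSSD}_{jj'} = \sum_{i=1}^{K} \left[ \frac{b_i (x_{ij} - x_{ij'})}{s(y)} \right]^2$
distance (WSSD).

WSS DISTANCE (of an observation
from the centroid of all
observations).

Cumulative standard deviation        $s_n$
estimated from near neighbors.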
The "interior" statistic, component effect, $C_{ij}$, of each variable
on  each  observation  is  printed  in  tabular  form  on  pages  66  to  79  where 
the  variables  are  ordered  by  their  decreasing  relative  influences 
in  columns,  and  the  observations  are  ordered  by  their  decreasing 
effects  on  the  most  influential  variables  in  rows.  Here  the 
analyst  can  see  which  particular  observations  are  most 
influential  in  their  effects  on  the  fitted  equation.  In  addition 
this  table  can  be  used  to  determine  the  importance  of  high 
correlation  among  the  independent  variables. 
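
The component effects and the weighted squared standardized distances can be sketched numerically as follows. This illustration assumes the definitions C_ij = b_i (x_ij - mean of x_i) and a distance formed by summing the squared differences in component effects divided by s(y); the function names, the coefficients and the data are hypothetical and are not taken from the report's printouts.

```python
import numpy as np

def component_effects(X, b):
    """C[j, i] = b_i * (x_ij - mean of x_i); rows are observations j,
    columns are independent variables i."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) * np.asarray(b, dtype=float)

def wss_distance(C, s_y):
    """Weighted squared standardized distance of each observation
    from the centroid of all observations."""
    return np.sum((C / s_y) ** 2, axis=1)

def wssd_between(C, j, j_prime, s_y):
    """Weighted squared standardized distance between observations j and j'."""
    return float(np.sum(((C[j] - C[j_prime]) / s_y) ** 2))

# Hypothetical data: three observations of two independent variables.
X = [[1.0, 10.0], [2.0, 10.0], [3.0, 40.0]]
b = [2.0, 0.1]  # slope coefficients b_1, b_2 (the constant b_0 is not used)
C = component_effects(X, b)
d = wss_distance(C, s_y=0.5)
d01 = wssd_between(C, 0, 1, s_y=0.5)
```

Weighting by the coefficients means that a large difference in a variable with a small coefficient contributes little to the distance, which is how uninfluential factors are discounted. Observations with a small distance between them are candidates for near neighbors.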

Standard Deviation Estimated From Residuals of Neighboring Observations

The cumulative estimates, $s_n$, of the standard deviation are
printed in the second column of pages 70 and 71. The WSSD$_{jj'}$
of the observations in columns 4 and 5 are printed in the third
column. Also, the observations are ordered by their increasing
fitted Y values. At the top of page 70 is the residual root mean
square = .48. The cumulative standard deviation column indicates
that the standard deviation estimated from near neighbors is
approximately .49, which agrees closely with the residual root
mean square and so gives no evidence of lack of fit.

Observations Ordered by Computer Input and by Residuals

As  shown  on  page  72  under  the  heading  "ORDERED  BY  COMPUTER 
INPUT,"  the  residuals  are  listed  in  the  order  in  which  the 
observations  were  given  to  the  computer.  The  Work  Unit  Code 
(WUC)  of  each  piece  of  equipment  is  given  in  the  first  column  for 
identification  purposes.  The  third  column  shows  the  WSS  DISTANCE 
for  each  observation.  Here  those  observations  far  from  the 
centroid of all observations can be easily spotted (observations
18 and 77). Under the heading "ORDERED BY RESIDUALS," the
residuals  are  listed  in  the  order  of  the  magnitude  of  the 
residuals.  This  gives  an  indication  of  which  observations  are 
fitted  the  best  (or  worst).  As  can  be  seen,  observation  46  is 
fitted  best  and  50  is  fitted  worst. 


STATISTICAL  PLOTS 

As  with  any  endeavor  dealing  with  Deductive  Reasoning,  the 
conclusions  are  dependent  on  the  validity  of  the  assumptions. 

Thus  the  analyst  must  have  some  means  of  verifying  the  degree  to 
which  the  assumptions  are  satisfied.  In  addition  to  the  number 
of  statistics  and  statistical  tables,  there  are  five  types  of 
computerized  plots  that  can  be  used  to  determine  how  close  the 
data  and  fitted  equations  satisfy  the  assumptions  of  the  method 
of  least-squares.  These  plots  give  the  analyst  much  insight  into 
the  fit  that  the  statistics  alone  cannot.  The  plots  are  used  to 
determine  (1)  whether  the  assumptions  of  the  method  of 
least-squares are "nearly" satisfied, (2) just how well (or badly)
the equation fits the data, and (3) to obtain further insights
properties affect the fit.

Cumulative  Distribution  of  Residuals 

When k independent variables are fitted to data with normally
distributed error (Assumption A4), it can be shown that the
residuals also have a normal distribution. Therefore, the graph
of the residuals versus cumulative frequency should be "nearly" a
straight line. Page 74 (Appendix B) is the cumulative frequency
plot for the fitted equation developed, with ln(MTBMA) as the
dependent variable. There is no indication here of deficiencies
in the fit.

Residuals vs Fitted Y

The plot of the residuals versus the fitted values of the
dependent variable is also helpful in checking Assumptions A1, A2
and A4. This plot may show whether there is some dependence of
the magnitude of the residuals on the magnitude of the fitted
values. Daniel and Wood [1] give four common defects that may
be revealed by such plots. Recall that Assumption A4 states that
the variance of the error is constant. The plot of residuals
versus fitted Y should then show an equal scatter about the
zero-residual line. The two plots, the cumulative frequency plot
and the plot of the residuals vs. fitted Y, can be used together
to determine whether an observation is an outlier (impossible
value).

However, if the observation is at the extreme ends of the range
of the dependent variable, curvature may be needed in the
relationship. On page 75 there is a plot of the residuals versus
fitted Y for the equation developed for MTBMA. The equal scatter
of the residuals about the zero-residual line does not indicate
deficiencies in the equation developed.

Residuals  vs  Independent  Variable  X(I) 

The pattern of the residuals in the plot of residuals versus
independent variable $x_i$ is useful in determining whether other
functional forms of the independent variables are needed. The
residuals should be equally scattered about the zero-residual line.

As an actual example, Figure 1 is a plot of the residuals versus
an independent variable $x_1$ where obviously a squared term is
needed in the equation. This plot was obtained when a fit

$$ y = b_0 + b_1 x_1 + b_2 x_2 $$

was made to a set of data when the true form of the equation was

$$ y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 $$

For this fit, however, the "global" statistics were significant and
did not indicate anything wrong with the fitted equation. In
particular $R_y^2 = .9047$ and the F-VALUE = 228. Figure 2 is
another example plot where the equation
was fitted to data and the true form was


$$ y = b_0 + b_1 x_1 + b_2 x_2^2 + b_3 x_1^4 $$

Here, $R_y^2 = .9963$ and the F-VALUE = 3267. These two simple
examples indicate why the practice of considering only $R_y^2$ and
the F-VALUE as measures of "goodness of fit" is not statistically
sound.
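
The point can be reproduced with synthetic data. The sketch below is an illustration under stated assumptions, not a reconstruction of Figures 1 or 2: the data are made up and error-free, the coefficients are arbitrary, and the helper name `fit` is hypothetical. An equation without the squared term is fitted to data whose true form contains one, and the residuals are then compared with the omitted curvature.

```python
import numpy as np

def fit(design, y):
    """Least-squares fit; returns (residuals, R^2)."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return resid, float(r2)

x1 = np.linspace(0.0, 10.0, 21)
x2 = np.cos(3.0 * x1)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 0.3 * x1 ** 2  # true form has a squared term

ones = np.ones_like(x1)
resid_lin, r2_lin = fit(np.column_stack([ones, x1, x2]), y)
resid_full, r2_full = fit(np.column_stack([ones, x1, x2, x1 ** 2]), y)

# Correlation between the residuals of the misspecified fit and the
# centered squared term that was left out of the equation.
curvature = (x1 - x1.mean()) ** 2
pattern = float(np.corrcoef(resid_lin, curvature)[0, 1])
```

The misspecified equation still attains a high squared multiple correlation, while its residuals follow the omitted squared term closely. That systematic pattern is what a plot of residuals versus the independent variable makes visible.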

Component and Component-Plus-Residuals vs Independent Variable X(I)

The component versus independent variable $x_i$ plot is a plot of
the component effect of each observation on each variable versus
the independent variable. The component-plus-residuals is the
sum of the component effect of each observation and its
residual. As stated in [1],

"Component-plus  residual  plots  are  used  as  an  aid 

(1)  to  choose  the  appropriate  form  of  the  equation, 

(2)  to  observe  the  distribution  of  the  observations 
over  the  range  of  each  independent  variable  and 

(3)  to  estimate  the  influence  of  each  observation 
on  each  component  of  the  equation." 

Observations  at  the  extreme  ends  of  the  ranges  of  the  independent 
variables  usually  control  the  estimates  of  the  statistics.  The 
component-plus-residuals  plots  can  be  used  (with  indicator 
variables  and  the  Cp-search  technique)  to  determine  if  these 
extreme  points  are  compatible  with  the  remainder  of  the  data.  If 
it  is  determined  that  these  extreme  values  are  not  compatible 
with  the  rest  of  the  data,  then  either  curvature  should  be 
introduced  in  that  independent  variable  or  other  subjective 
information  (introduced  by  indicator  variables)  about  the  points 
in question should be considered. Pages 76 through 121 show
the residuals and component-plus-residuals versus each of the 23
independent variables in the MTBMA fit.

CP  vs  P 

The Cp-plot (developed by Mallows [3]) is a plot of the
Cp-statistic for an equation versus p, where p = k + 1. For those
equations with negligible bias, the Cp-statistic will fall near
the line Cp = p. Obviously the analyst would like to choose the
equations with the smallest total squared error Cp and with the
least amount of bias. Page 123 is the Cp-plot for MTBMA.
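
A minimal numerical sketch of the Cp-statistic follows. It assumes the usual form Cp = RSS_p / s^2 - (N - 2p), where RSS_p is the residual sum of squares of the equation with p parameters (including the constant) and s^2 is the residual mean square of the equation containing all K variables. The data are randomly generated for illustration, and the function names are hypothetical.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ b) ** 2))

def mallows_cp(X, y, subset):
    """Cp for the equation using the columns of X listed in `subset`."""
    n, k = X.shape
    ones = np.ones((n, 1))
    s2_full = rss(np.hstack([ones, X]), y) / (n - k - 1)
    p = len(subset) + 1  # number of parameters including the constant
    return rss(np.hstack([ones, X[:, list(subset)]]), y) / s2_full - (n - 2 * p)

# Synthetic data: only the first of three variables actually affects y.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

cp_full = mallows_cp(X, y, (0, 1, 2))
cp_good = mallows_cp(X, y, (0,))
cp_bad = mallows_cp(X, y, (1, 2))
```

By construction the full equation always has Cp equal to its own p, so that point lies on the line Cp = p. An equation that omits an influential variable is biased and plots far above the line.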

STATISTICAL  TECHNIQUES 

There  are  two  techniques  utilized  that  are  helpful  in  finding  the 
subset  collection  of  variables  which  best  fits  the  data  and  in 
determining  the  stability  of  the  equations  obtained. 
Cp-Search Technique

Two approaches that have been widely used to search the $2^K$
possible equations for the "best" combination of variables are
called "Stepwise Regression (forward and backward)" and the
"F-test." Forward stepwise regression introduces the independent
variables, one at a time, into the equation and uses a criterion
involving the $t_i$ values to determine whether or not $x_i$ should
be left in the equation. Backward stepwise regression begins with
the complete initial set of variables and drops the variables by
using a criterion similar to that of forward stepwise regression.
The F-test is widely used to determine the significance of adding
an independent variable. Obviously, these two techniques do not
search all the $2^K$ possible equations, but only a portion of them.
Moreover, a search with these techniques can lead to different
results when the independent variables are correlated or if the
variables are introduced in different orders.

With the equation form assumed, there is usually some smaller
subset of these variables which have "very" influential effects
on the dependent variable. This subset can be called the Basic
Set of Variables. A search proposed by Daniel and Wood (1) called
the $t_K$-directed search is used to determine a Basic Set. With
the Basic Set always included, a technique called the Cp-search
technique is used to search up to $2^{18}$ = 262,144 equations for
the combination of variables which gives the smallest
Cp-statistic, and hence smallest total squared error. In some
cases (e.g., when the wrong form of the equation is used) there
is no Basic Set of Variables, and here the analyst has the option
of choosing a basic set (usually those variables with the largest
$t_i$ values) until there are at most 18 variables remaining to be
searched by the Cp-search technique. This method is called
Fractional Replication (see [1]). The Cp-search technique using
both types of searches has consistently helped to narrow down
the initial set.
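
The size of such a search can be checked directly. The sketch below only enumerates candidate equations; it does not fit them, and the function and variable names are hypothetical. Every candidate contains the basic set plus one of the possible combinations of the remaining searchable variables, so 18 searchable variables give 2^18 candidates.

```python
from itertools import combinations

def candidate_equations(basic_set, searchable):
    """Yield every variable subset made of the basic set plus any
    combination (including none) of the searchable variables."""
    for r in range(len(searchable) + 1):
        for extra in combinations(searchable, r):
            yield tuple(basic_set) + extra

# With 18 searchable variables the search covers 2**18 equations.
n_equations_18 = 2 ** 18

# A small hypothetical case: one basic variable and three searchable ones.
small = list(candidate_equations(("x1",), ("x2", "x3", "x4")))
```

In a full search each enumerated subset would be fitted and ranked by its Cp-statistic; fixing the basic set in advance is what keeps the number of fits manageable.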

Cross  Verification  of  Coefficients  with  a  Second  Sample  of  Data 

Once a presumably final equation is obtained, the analyst must
determine the stability of the equation's coefficients.
There  may  be  a  few  observations  in  the  data  base  (such  as  those 
with  large  WSS  DISTANCE  and  large  residuals)  that  are  not 
compatible  with  the  rest  of  the  data  and  may  be  controlling  the 
estimates  of  the  fitted  coefficients.  A  way  to  determine 
stability  is  to  drop  those  observations,  run  another  regression 
and  determine  the  effects  on  the  least-squares  estimates  of  the 
coefficients.  This  technique  is  called  cross  verification  of 
coefficients  with  a  second  sample  of  data  and  provides  a  rigorous 
test  of  the  data,  the  model  and  the  fitted  coefficients.  As 
shown  in  Section  V,  cross  verification  of  coefficients  with  a 
second  sample  of  data  can  also  be  used,  under  limited 
circumstances,  to  attempt  demonstration  of  the  predictive 
validity  of  the  relationships  developed.  Component-plus-residual 
plots  of  the  second  sample  of  data  (where  residuals  are 
calculated  using  the  initial  coefficients)  may  point  out  those 
observations  which  may  indicate  that  other  forms  of  curvature  are 
needed.
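
The drop-and-refit check described above can be sketched as follows. This is a minimal sketch under stated assumptions: ordinary least squares with a constant term, and suspect observations identified beforehand (for example by large residuals). The names `fit` and `refit_without` are hypothetical and are not taken from the study's program.

```python
import numpy as np


def fit(X, y):
    """Least-squares coefficients (constant first) for y on the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def refit_without(X, y, suspect_rows):
    """Refit after dropping the suspect observations.

    Returns the coefficients from all data, the coefficients from the
    reduced data, and the percent change of each coefficient.
    Assumes no coefficient of the full fit is exactly zero.
    """
    keep = np.ones(len(y), dtype=bool)
    keep[list(suspect_rows)] = False
    b_all = fit(X, y)
    b_kept = fit(X[keep], y[keep])
    pct = 100.0 * (b_kept - b_all) / np.abs(b_all)
    return b_all, b_kept, pct
```

Large percent changes, or changes of sign, suggest that the dropped observations were controlling the estimates.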
SECTION III


THE  "DESIGN  OF  THE  EXPERIMENT" 

The approach to the development of the relationships used in
ALPOS was to identify candidate observations called Line
Replaceable Units (LRUs), to collect data on physical, logistics
and cost parameters (variables) of these LRUs, to fit equations
to the data via the techniques of multiple regression analysis,
and to use the results of the regressions to construct the model.
The data (a year's worth) are displayed in Appendix A and
were extracted from a number of sources, including existing Air
Force data systems (AFM 66-1, Increased Reliability of
Operational Systems (IROS) and Visibility and Management of
Operations and Support Costs (VAMOSC)), site visits and contacts
with various Air Force agencies and Westinghouse activities,
published reports and engineering analyses. The study is based on
128 pieces of avionics equipment (observations),
where each observation can be identified by its observation
number and Work Unit Code (WUC). Also associated with each
observation is a total of 29 variables, of which 19 are
independent variables and 10 are dependent variables. It should
be noted that the variable NRTS is used as a dependent variable
in one fit and is also used as an independent variable in those
fits in which NRTS is expected to have a significant impact (e.g.,
the logistics support cost in the field).

The  independent  variables  are  of  two  types,  quantitative  and 
qualitative.  The  usual  types  of  variables  in  a  regression 
exercise  are  quantitative  (i.e.,  variables  that  may  take  on 
values  over  a  given  range)  such  as  weight  or  other  physical 
characteristics  of  the  equipment.  Many  times  additional 
(qualitative)  information  is  available,  such  as  certain 
characteristics  of  the  equipment  or  a  certain  class  in  which  the 
equipment  belongs,  which  should  not  be  discarded,  but  should  be 
introduced  into  the  regression.  "Indicator"  variables  (variables 
which  take  on  the  value  of  0  or  1)  are  used  to  introduce 
qualitative  information  into  the  regressions.  A  "1"  indicates 
that  the  observation  is  in  a  certain  class  and  a  "0"  indicates 
that  it  is  not. 

The  type  of  aircraft  in  which  a  piece  of  equipment  is  used  and 
the  equipment  avionics  areas  are  the  two  qualitative  classes  used 
in  this  study.  There  are  three  types  of  aircraft:  Fighters, 
Bombers  and  Cargo,  and  three  areas  of  avionics:  Navigation, 
Sensory,  and  Communications.  Table  3  shows  the  observation 
numbers  of  each  piece  of  equipment  and  the  class  to  which  it 
belongs.  For  instance,  observation  69  is  a  piece  of 
communications  equipment  that  is  used  in  a  fighter.  Since  there 
is  no  sensory  equipment  used  in  cargo  type  aircraft,  no 
observations are present there. The numbers in parentheses
indicate the quantity of observations in each category. Thus 29
LRUs are used in bombers and 35 LRUs are sensory type equipment.
The numbers in the corners of the inner rectangles indicate the
number of observations which fall in the respective interactive
classes. There are 34 observations which are used in fighters in
addition to being navigation equipment.

Table  4  lists  the  names  of  all  the  variables  and  their  associated 
variable  names  used  in  the  computer  printouts  of  the  regressions. 
Also  listed  are  the  units  in  which  the  variables  are  expressed. 
FIG, BOM and CAR are indicator variables used to represent the
three types of aircraft: fighters, bombers and cargo,
respectively. NAV, SEN and COM are used to represent the three
areas of avionics: navigation, sensory and communications,
respectively.


TABLE 4

VARIABLES USED IN THE REGRESSIONS

INDEPENDENT VARIABLES

INDICATOR                 QUANTITATIVE

FIGSEN = FIG * SEN        UP = Unit price
FIGCOM = FIG * COM        VO = Volume
BOMNAV = BOM * NAV        WT = Weight
BOMSEN = BOM * SEN        CC = Components count
BOMCOM = BOM * COM        CD = Components density
CARNAV = CAR * NAV        DI = % Digital
CARCOM = CAR * COM        EM = % Electro-mechanical
                          PS = % Power supply
                          XT = % Transmitter
                          SS = % Solid state
                          PD = Power dissipation
                          UF = Utilization factor

DEPENDENT VARIABLES

MTBF  = Mean (operating) Time Between Failure
MTBMA = Mean (operating) Time Between Maintenance Actions
MHTOT = Maintenance Manhours (total) per Operating Hour
MHUNS = Maintenance Manhours (unscheduled) per Operating Hour
MHSHO = Maintenance Manhours (shop) per Operating Hour
LSCTO = Logistics Support Costs (total) per Operating Hour
LSCFD = Logistics Support Costs (field) per Operating Hour
NRTS  = Fraction Not Repairable This Station
SRA   = Depot Special Repair Activity Cost per unit
TRAIN = Training Cost per Operating Hour


Since  interactions  proved  to  be  significant  in  all  relationships 
in  Phase  I,  the  interactive  classes  were  used  as  the  indicator 
variables  for  Phase  II  regressions  where  FIGNAV  (not  in  the 
regression)  is  the  interactive  baseline. 
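
The coding of the interactive indicator variables can be illustrated with a short sketch. It follows Table 4 (for example FIGSEN = FIG * SEN) and treats FIGNAV as the baseline, which therefore has no indicator of its own. The function name `indicator_row` is hypothetical, and rejecting the empty cargo/sensory class is a choice made for this illustration.

```python
AIRCRAFT = ("FIG", "BOM", "CAR")
AREAS = ("NAV", "SEN", "COM")
# FIGNAV is the baseline class and CARSEN has no observations in the study,
# so neither appears among the seven interaction indicators of Table 4.
INTERACTIONS = ("FIGSEN", "FIGCOM", "BOMNAV", "BOMSEN", "BOMCOM", "CARNAV", "CARCOM")


def indicator_row(aircraft, area):
    """Return the seven 0/1 interaction indicators for one LRU."""
    if aircraft not in AIRCRAFT or area not in AREAS:
        raise ValueError("unknown aircraft type or avionics area")
    if (aircraft, area) == ("CAR", "SEN"):
        raise ValueError("the study has no sensory equipment in cargo aircraft")
    # Main-effect indicators: 1 if the LRU is in the class, 0 otherwise.
    main = {name: int(name in (aircraft, area)) for name in AIRCRAFT + AREAS}
    # Each interaction indicator is the product of its two main-effect indicators.
    return {name: main[name[:3]] * main[name[3:]] for name in INTERACTIONS}
```

For a piece of communications equipment used in a fighter (such as observation 69), only FIGCOM is 1. For fighter navigation equipment every indicator is 0, so its effect is absorbed by the constant term.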

There are three lines of information associated with each
observation in Appendix A. The first line lists the 7 indicator
variables, the second the (quantitative) independent
variables, and the third the 10 dependent
variables. For instance, observation 5 with WUC 71PBO is a piece
of navigation equipment that is used in a fighter, with an
adjusted unit price of $2241, weight = 36.5 lbs., volume =
1276 in^3, power dissipation = 289 watts, MTBMA = 476 hours, and
SRA = $470.

Initially 128 LRUs were considered for the study. Many of the
observations were dropped from the analysis because of the lack
of data or the difficulty in obtaining the necessary data. Other
observations, such as equipment which had not been in the Air
Force inventory long enough to experience "good" data, were
discarded so as not to introduce bias in the results. In
addition, a few of the observations had missing dependent variable
data and were omitted for that particular fit. The observation
numbers and WUCs of each piece of equipment used in each fit can
be obtained from the tables "ORDERED BY COMPUTER INPUT" in
Appendix B.

Many times the statistics, plots, tables and techniques indicate
that some observations do not behave like the remainder of the
data. Besides other possible subjective variables, curvature may
be causing this instability. In addition to the variables shown
in Table 4, two transformations of the independent variable (the
square and natural logarithm) and a transformation of the
dependent variable (natural logarithm) are introduced into the
regressions when curvature is indicated. The natural logarithm
transformation is considered for those variables whose range is
contained in the positive real numbers. Using UP = unit price as
an example, the variable names of the transformed independent
variables as listed in the computer printouts of the LLSCFP are
of the form:

MUP = UP - mean(UP)
DUP = (UP - d_UP)^2
LUP = ln(UP)

where mean(UP), d_UP and ln(UP) indicate the mean, d-statistic, and
natural logarithm of the variable unit price (UP), respectively.
Table  5  shows  the  d-statistics  and  means  of  each  quantitative 
independent  variable  used  in  each  of  the  regressions. 
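
The three transformed variables can be computed as in the sketch below. The d-statistic is treated here as a given input (for instance a value read from Table 5), because this section does not state how it is calculated. The function name `transform_variable` is hypothetical.

```python
import math


def transform_variable(values, d):
    """Return the M, D and L transforms of one quantitative variable.

    M = value minus the sample mean, D = (value - d) squared, and
    L = natural logarithm. L is returned as None when the variable is not
    strictly positive, since the logarithm is then not considered.
    """
    mean = sum(values) / len(values)
    m = [v - mean for v in values]
    d_sq = [(v - d) ** 2 for v in values]
    log = [math.log(v) for v in values] if min(values) > 0 else None
    return m, d_sq, log
```

Applied to unit price, the three returned lists correspond to MUP, DUP and LUP.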

Before beginning any regressions, the data must be critically
analyzed for outliers (impossible values) and for what is known
as "Nested Data". The data are said to be "Nested" if some of the
observations have the same or approximately the same values for
all or nearly all of the x-variables. Obviously outliers would
have a significant impact on the fitted coefficients, thereby
yielding incorrect relationships. If the equations are fitted
without checking to determine whether or not the data are nested,
the wrong factors may appear significant. The analysis of
"Nested Data" was first introduced into the statistical
literature by Daniel and Wood ((1), Chapter 8).

As a means of determining whether the data are "nested", a
computer program was written to sort the data table by each
independent variable, x_i. For instance, Table 6 is a printout of
the data ranked by weight. As can be seen, there are only a
few observations with the same x_i values, and hence no evidence
of serious nesting exists. The component-plus-residual
plots can also be used to detect nested data. If it is determined
that the data are nested, two fittings must be made: one on the
nested data "within plots" and one "among plots".
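
The sorting check for nested data can be sketched as follows. This is an illustration only, not the program written for the study. It ranks the observations on one independent variable and reports groups whose neighboring values differ by no more than a tolerance. The function name `tied_groups` and the `tol` parameter are hypothetical.

```python
def tied_groups(values, tol=0.0):
    """Group observation indices with equal (or nearly equal) values on one variable.

    Observations are ranked by value; neighbors in the ranking that differ by
    at most `tol` fall in the same group. Groups of size one are not reported.
    """
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] <= tol:
            current.append(nxt)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [nxt]
    if len(current) > 1:
        groups.append(current)
    return groups
```

Running this once per independent variable and looking for the same observations grouped together on all or nearly all variables gives the evidence of nesting that the text describes.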

The reader is referred to Volume I of this report to see other
factors which define the observations, the definitions of each
variable used (independent and dependent) and which data systems
were used to obtain the information (raw data) for each variable.


SECTION  IV 


RESULTS  OF  THE  REGRESSION  ANALYSES 

After the data were critically examined and alternate LRUs
considered where necessary, the multiple regression analyses for
each dependent variable (with variations) in Table 4 were
performed, with the results displayed in Appendix B. Table 7
gives a summary of these results, including the run numbers (for
identification purposes), the number of independent variables,
the number of variables in the Basic Set, the multiple
correlation coefficient squared, the F-value, the residual root
mean square (standard deviation), the standard deviation
estimated from residuals of neighboring observations, the
observations with large WSSD, and the normal, fitted Y and
component-plus-residuals plots. The four parameters MTBF (Run
1), MTBMA (Run 2), MHTOT (Run 3) and LSCTO (Run 6) are major
drivers of Operations and Maintenance (O & M) costs and were
therefore given more attention. We note that, after much
analysis, the natural logarithm transformation was used for each
dependent variable except NRTS. Each equation was approached as
the statistics, tables, computerized plots and techniques
directed, with much emphasis on the "total picture" of the
results. This included: (1) the use of the Normal and Fitted
values plots to detect outliers; (2) the use of the
component-plus-residuals plots to detect those observations which
extend the ranges of the independent variables by a "significant"
amount, followed by the use of indicator variables in conjunction
with the Cp-search technique to determine whether such
observations behave like the rest of the data (thereby simply
extending the ranges of the variables) or are not consistent with
the remainder of the data (possibly indicating curvature in the
form of squares and natural logarithms); (3) the use of the
technique of cross verification of coefficients with a second
sample of data to determine if those observations with large
(absolute) residuals or large WSS distance are in fact
controlling the estimates of the coefficients (possibly
indicating curvature) and to determine the stability of the
relationships developed; and (4) the use of the Cp-search
technique to find those subset collections of independent
variables which approximate the data with smallest total squared
error. Volume II of the Phase I final report (2) gives a sketch
of an example indicating how the approaches and techniques were
utilized to fit the parameter MTBMA to the Phase I data.


(2) Op. Cit.

As shown in Table 7, all equations (except NRTS) show significant
results with no indication of serious lack of fit. The multiple
correlation coefficients squared range from .85 (Run 10) to .93
(Run 3 and Run 11), with F-values significant at the .01 level of
significance. The residual root mean square (RRMS) and the
cumulative estimates of the standard deviation (for instance, .48
versus .49 for Run 9) indicate that there is no evidence of lack
of fit. A cross verification of coefficients was performed on
those relationships which had observations with large WSSD (Run
1, Run 2, Run 8), and there was no "significant" change in the
values of the coefficients. The observations with high and low
values in the Normal and Fitted values plots are not extreme, as
in the case of outliers, but have RRMS slightly larger than the
remainder of the observations in the data base (see Appendix B).
For instance, the MHTOT equation (Run 3), although significant,
had one observation, 11, with the largest (negative) residual in
each of the intermediate fits, regardless of the functional form
considered. Observation 11 with WUC 71CAO is a
receiver-transmitter (navigation equipment) used in the F15A.
Investigation into the data elements for this LRU did not
indicate errors in the collected data. Consequently, observation
11 remained in the data base, since not enough is known
about avionics equipment and the form that the MHTOT equation
should take to deem it an outlier. A cross verification of
coefficients was performed for those observations with large
residuals and produced only slight changes in the values of the
coefficients. Thus, the relationships were judged to be stable.
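
The summary statistics quoted above (multiple correlation coefficient squared, F-value and residual root mean square) can be computed from a single fit as sketched below. The sketch assumes ordinary least squares with a constant term and the standard textbook definitions; it does not reproduce the neighboring-observation estimate of the standard deviation, and the function name `fit_summary` is hypothetical.

```python
import numpy as np


def fit_summary(X, y):
    """Return coefficients, R squared, F-value and RRMS for a least-squares fit."""
    A = np.column_stack([np.ones(len(y)), X])   # constant term first
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    n, p = A.shape
    rss = float(resid @ resid)                  # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())    # total sum of squares about the mean
    r2 = 1.0 - rss / tss
    rrms = (rss / (n - p)) ** 0.5               # residual root mean square
    f_value = ((tss - rss) / (p - 1)) / (rss / (n - p))
    return {"coef": beta, "R2": r2, "F": f_value, "RRMS": rrms}
```

Comparing the RRMS with an independent estimate of the standard deviation, as the text does for Run 9 (.48 versus .49), is then a check for lack of fit: close agreement gives no evidence of it.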

It  was  anticipated  that  NRTS  could  not  be  estimated  well  with  the 
independent  variables  considered  in  this  study  since  NRTS  is 
highly  dependent  on  many  other  factors  (many  subjective)  which 
are  not  considered.  Leaving  out  influential  variables  can  make 
other  collections  of  less  influential  variables  appear 
significant  when  in  fact  they  are  not.  Although  there  is  serious 
lack  of  fit  in  the  NRTS  relationship,  10.07  versus  7.52,  the 
results  are  still  useful,  since  only  large  differences  in  NRTS 
cause  significant  changes  in  the  total  number  of  spares  estimated 
by  the  EBO  routine  (Volume  I).  The  ALPOS  model  provides  the 
option  for  the  user  to  input  his  own  value  for  NRTS  or  to  use  the 
relationship  developed  for  NRTS  when  no  estimates  of  NRTS  are 
available.  There  are  also  three  relationships  (Run  11,  Run  12, 
Run  13)  which  use  NRTS  as  an  independent  variable.  The  user  can 
input  different  values  of  NRTS  over  a  specific  range  and 
sensitize  NRTS  to  determine  the  impact  that  changes  in  NRTS  have 
on  the  model  outputs. 

All of these options for NRTS aid in determining the impact that
different maintenance philosophies have on avionics equipment
costs. The stability of the NRTS relationship was attained, and
validation results (Section V) indicate to the author that, with
the data, indicator variables, independent variables, and
transformations considered in the study, the "best" relationship
for NRTS was obtained.

Table 8 gives a summary of the independent variables (including
transformations) selected via statistical analyses for each
relationship developed. Table 9 summarizes Table 8, indicating
which of the independent variables considered each dependent
variable is a function of. Many of the relationships are related
and can be compared. For instance, the reciprocal of the MTBF
(Run 1) is the inherent reliability failure rate of the
equipment. The reciprocal of the MTBMA (Run 2) is the total
equipment reliability failure rate, of which the MTBF is a
significant portion, but it also includes other factors such as
manufacturing defects, wearout rate, dependent failure rate,
operator-induced failure rate, maintenance-induced failure rate
and equipment damage rate (3). Both MTBF and MTBMA are dependent
on 23 independent variables (Table 8), where in particular they
are a function of the same 23 independent variables (Table 9).
Table 9 shows that both MTBF and MTBMA are functions of all 20
independent variables considered in this study except FIGSEN,
BOMSEN and the components count (CC).

The total maintenance manhours per operating hour (Run 3) is the
sum of the scheduled manhours per operating hour (not fitted),
the unscheduled manhours per operating hour (Run 4) and the
maintenance manhours per operating hour in the shop (Run 5).
Tables 8 and 9 show that MHTOT, MHUNS, and MHSHO are
dependent on many of the same independent variables. The total
logistics support cost per operating hour (Run 6) is the sum of
logistics support cost per operating hour in the field (Run 7),
the special repair facility, packing and shipping and
condemnation costs (not fitted). Table 9 shows that LSCTO
and LSCFD are functions of the same independent variables, where
the only difference, as shown in Table 8, is that the LSCFD
relationship has WT as an independent variable whereas the LSCTO
relationship uses MWT and DWT in its place.

As stated previously, three relationships, MHTOT (Run 11), MHSHO
(Run 12) and LSCFD (Run 13), which "should" be functions of NRTS
were fitted to the data with NRTS included as an independent
variable and can be compared with their counterparts Run 3, Run 5
and Run 7, respectively, which do not consider NRTS. As shown in
Appendix B, the two least influential variables of Run 3, XT and
MPS, were not admitted by the Cp-search technique when NRTS proved
to be influential in Run 11. Also, several fits (e.g., MHUNS)
were made on dependent variables (with NRTS as an independent
variable) which should not be a function of NRTS. In those
cases, the Cp-search in fact did not accept NRTS as being
significant enough to remain in the relationships.


(3) Logistics Engineering and Management, B. S. Blanchard,
Prentice-Hall, Inc. (1974).

TABLE 8

Independent Variables Selected For Each Fit

[Table body not recoverable from the source. Rows: independent
variables (FIGSEN, FIGCOM, BOMNAV, BOMSEN, BOMCOM, CARNAV,
CARCOM, ...); columns: run numbers.]


TABLE 9

Independent Variables (without transformations) Needed For Each Fit

[Table body not recoverable from the source.]


Unfortunately, since the number of relationships considered was
large, it was not feasible to include all results ("total
picture") for each fit developed. However, all results are in an
addendum to this Volume which is retained at the Air Force
Avionics Laboratory (AFAL).


SECTION  V 


RESULTS  OF  THE  VALIDATION  INVESTIGATION 


The aspects of model validity are discussed in Volume I,
including replicative and predictive validity. Replicative
validity has been demonstrated by the "goodness of fit"
statistics and results in Section IV. The strongest measure of
validity, the demonstration of which is wanting in most
estimating models (accounting, subjective, regression, "crystal
ball", etc.), is the condition in which the model is predictively
valid, that is, when it can match real system data before the
model has "seen" the data. For the ALPOS model, this is, of
course, its raison d'etre. In other words, the functional
relationships in the model should be able to generate as outputs
MTBFs, MTBMAs, manhours, etc., based on inputs of the avionics
LRU physical and electrical characteristics, which are "similar"
to the MTBFs, MTBMAs, manhours, etc. found in the MDC, IROS and
OSCER data systems. The key word for this type of validity is
similar. Thus, in order to investigate predictive validity, it
is necessary to perform the same type of data collection and
analysis required in establishing the data base for the
regressions. That is, for each LRU included in the validation
analysis, all of the independent and dependent variables required
as input to exercise the estimating relationships must be
collected or derived. There are 11 LRUs (Appendix C) with the
associated independent and dependent variable data, across
aircraft types, avionics areas and levels of complexity, that
have been selected for this limited investigation of validity
(see Volume I).

Several approaches have been taken with the intent of
demonstrating predictive validity for multiple regression
analysis models, including running a totally new (validation)
data base through the regression relationships obtained via the
original data base, prediction intervals, and cross verification
of coefficients with a second sample of data. The correct way to
attempt demonstrations of predictive validity for regression
models is to develop a new (validation) data base (of
"approximately" the same size and complexity as the original data
base) and run this validation data base through the regression
equations (using the same functional form) developed. The
analyst must then evaluate the "total picture", including
coefficients, statistics, statistical plots and techniques, to
determine if there is a significant difference in the results
using the validation data base. Obviously this approach is outside
the scope of this particular study, since the data collection
and analysis effort comprises a large portion of the project
task costs and man-hours for model development.


A prediction interval (sometimes called a confidence interval) is
an interval for which the analyst is confident (e.g., 95%
confident) that the estimated value of the dependent variable
(e.g., MTBF, MHTOT, etc.) for a particular observation in the
validation data base is within the bounds of the interval, where
the lower the confidence level the smaller the interval. There
are two major drawbacks in demonstrating predictive validity for
multiple regression analysis models via the prediction interval
approach. First, the prediction interval approach depends
heavily upon the assumption that the equation obtained through
statistical analyses is the correct form of the equation.
Although many types of analyses, based on the data available,
have shown statistically significant results (as demonstrated
through replicative validity), it is not definitely known that the
regression equation obtained is the correct form. This has been
demonstrated many times in the physical and social sciences,
where equations that were used for many years were updated as new
data and information became available. Secondly, a prediction
interval is calculated for each observation in the validation
data base one at a time, and hence the "total picture" of the
effects that the validation data have on the coefficients,
statistics, plots and tables cannot be evaluated. The prediction
interval, however, has its merits when considering much less
involved statistical methods of estimation than that of multiple
regression analysis.
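
For reference, a prediction interval for one new observation can be sketched as below. The sketch assumes the standard least-squares result, y_hat plus or minus t * s * sqrt(1 + x0' (X'X)^-1 x0), which the source does not state explicitly. The critical value `t_crit` is supplied by the caller (for example from a t table with n - p degrees of freedom), and the function name `prediction_interval` is hypothetical.

```python
import numpy as np


def prediction_interval(X, y, x_new, t_crit):
    """Return (lower, predicted, upper) for one new observation x_new."""
    A = np.column_stack([np.ones(len(y)), X])        # constant term first
    n, p = A.shape
    xtx_inv = np.linalg.inv(A.T @ A)
    beta = xtx_inv @ A.T @ y
    resid = y - A @ beta
    s2 = float(resid @ resid) / (n - p)              # residual mean square
    a0 = np.concatenate([[1.0], np.atleast_1d(x_new)])
    y_hat = float(a0 @ beta)
    half = float(t_crit * np.sqrt(s2 * (1.0 + a0 @ xtx_inv @ a0)))
    return y_hat - half, y_hat, y_hat + half
```

A smaller critical value (a lower confidence level) gives a narrower interval, as the text notes. The formula also presumes that the fitted equation has the correct form, which is the first drawback described above.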

The approach used in this study for attempting the demonstration
of predictive validity is cross verification of coefficients with
a second sample of data, where the validation data added to the
original regression data make up the second sample of data. The
coefficients using the original data are saved and can be
compared with the coefficients obtained using the second sample.
In addition, the "total picture" including statistics, plots,
tables, etc. can be evaluated. The change in coefficients could
be negligible, a change of 100% or more could occur, and a change
in sign of the coefficients is also conceivable.
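
The comparison of the two sets of coefficients can be sketched as follows. The function name `compare_coefficients` is hypothetical, and percent change is taken relative to the magnitude of the original coefficient, which is an assumption about how the percent change was calculated.

```python
def compare_coefficients(original, second):
    """Report percent change and sign change for each named coefficient.

    `original` and `second` map variable names to fitted coefficients from
    the original data and from the second sample, respectively.
    """
    report = {}
    for name, b0 in original.items():
        b1 = second[name]
        # A zero original coefficient has no meaningful percent change.
        pct = float("inf") if b0 == 0 else 100.0 * abs(b1 - b0) / abs(b0)
        report[name] = {"pct_change": pct, "sign_changed": b0 * b1 < 0}
    return report
```

For example, with made-up coefficients {"DUP": 0.2, "XT": -0.5} from the original data and {"DUP": 0.3, "XT": 0.1} from the second sample, DUP changes by about 50% without a change of sign, while XT changes sign.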

A cross verification of coefficients was performed on each of the
15 relationships developed, using the 11 observations in the
validation data base, with the "total picture" of the results
displayed in Appendix D. As in Appendix B, space did not permit
all plots to be included. The MTBMA (RUN20) equation, however,
has all statistics, plots, tables and techniques included. A
summary of some of these results is shown in Table 10, including
the dependent variable, the run numbers (for identification
purposes), the percent change in coefficients, the multiple
correlation squared, the F-value, the observations in the
validation data base which had "large" residuals, the
component-plus-residuals plots, the number of variables in the
basic set and the results of the Cp-search using the second
sample of data. For instance, the results for LSCTO (RUN60)
showed no "significant" change in the values of the coefficients,
R^2 = .88 and F-value = 44.3. There were no observations in the
validation data base which had larger residuals than the
observations used to develop the regression equation; there were
no discrepancies in the component-plus-residuals plot; there were
16 variables in the basic set of variables; and the results of the
Cp-search agreed with the original results in that there is no
subset collection of variables which fits the data with less total
squared error than that which was developed.

The results of this limited validation investigation were quite
encouraging. The multiple correlation coefficient squared fell only
slightly in each case, whereas the F-values increased in all but one
case (MTBF). There were no major discrepancies in the
component-plus-residuals plots for any of the 15 equations. The basic
set of variables was about the same, with an increase in some cases for
the second sample of data. The results of the Cp-search technique were
the same in 6 of the relationships. In 4 of the relationships the
Cp-search technique indicated that the equations developed were the
second "best" relationships, but in all cases the residual mean square
(a measure of the error of prediction) was smallest for the equations
developed. There were two observations in the validation data base that
had "larger" residuals than the observations used to develop the
regression relationships. Observation 209 was low for MTBF (RUN01) and
high for MHSHO (RUN50), and observation 210 was high for all
relationships fitted for MHTOT (RUN30, RUN110, RUN140). Observations
209 (WUC 71CAO, LORAN Receiver) and 210 (WUC 71GAO, Glide Slope
Receiver) are both navigation equipment used in the C-5A aircraft.
These two observations remained in the validation data base, since
investigation into each of the data elements (independent and dependent
variables) did not indicate errors in the collected data.

Of  the  15  equations  developed  for  the  ALPOS  model  there  were  only  5 
equations  (three  involving  MHTOT  and  two  involving  MHSHO)  where  the 
coefficient  of  only  one  variable  (DUP  in  each  case,  where  DUP  is  the 
unit  price  minus  its  d-statistic  squared)  changed  by  an  average  of  only 
about  50%.  The  three  equations  involving  MHTOT  were  those  developed 
without  NRTS  as  an  independent  variable,  with  NRTS  as  an  independent 
variable  and  the  results  of  the  C  -search  technique  to  find  the  second 
"best"  equation  for  MHTOT.  The  t8o  equations  involving  MHSHO  were  those 
with  and  without  NRTS  as  an  independent  variable.  This  change  in  the 
magnitude  of  the  regression  coefficient  for  the  variable  DUP  could  have 
been  caused  by  the  increase  in  the  range  of  DUP  which  was  a  result  of 
adding  the  validation  data  to  the  original  data.  The  limited  size  of  the 
validation  data  b’ise  could  have  also  caused  this  instability.  It  should 
be  noted,  however,  that  in  each  of  these  5  cases  where  the  coefficient  of 
DUP  changed  by  about  50%,  the  variable  DUP  was  among  the  three  least 
influential  variables  in  the  5  equations  developed  (see  Appendix  D).  In 
the  cases  where  the  percent  change  in  the  coefficient  was  OK,  there  were 
only  two  equations  with  only  one  coefficient  in  each  case  which  changed 
by  about  30%,  but  in  most  of  the  remaining  equations  the  percent  change 
was  far  below  20%  in  each  coefficient  of  each  equation  developed.  It 
should  be  noted  here  that  each  of  the  coefficients  could  have  changed 
by  well  over  100%.  Moreover,  the  sign  of  each  coefficient  could  also 
have  changed  (i.e.,  from  a  positive  coefficient  to  a  negative  coefficient 
and  vice  versa).  In  these  extreme  cases,  where  the  stability  of  the 
equation  is  questionable,  there  would  have  been  cause  to  re-examine  the 
validation  data,  the  regression  analysis  data,  and  the  estimating 
relationships  developed. 
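
The  stability  comparison  described  above  can  be  sketched  in  Python  as 
follows.  This  is  a  minimal  sketch  under  stated  assumptions:  the 
coefficient  values  are  invented,  the  names  X1  and  X2  are  hypothetical 
placeholders  (only  DUP  is  a  variable  name  from  this  study),  and  the 
review  rule  simply  encodes  the  two  extreme  cases  named  in  the  text,  a 
change  of  more  than  100%  or  a  change  of  sign. 

```python
def coefficient_changes(original, updated):
    """Compare coefficients before and after adding validation data.

    original, updated: dicts mapping variable name -> fitted coefficient.
    Returns, per variable, the absolute percent change, whether the sign
    flipped, and whether the coefficient falls in an "extreme" case
    (assumed rule: more than 100% change, or a sign change).
    """
    report = {}
    for name, old in original.items():
        if old == 0:
            raise ValueError(f"percent change undefined for {name}: original is 0")
        new = updated[name]
        pct_change = abs(new - old) / abs(old) * 100.0
        sign_flip = old * new < 0
        report[name] = {
            "pct_change": pct_change,
            "sign_flip": sign_flip,
            "needs_review": sign_flip or pct_change > 100.0,
        }
    return report


# Invented coefficients, for illustration only.
original_fit = {"DUP": 0.40, "X1": 1.20, "X2": 0.10}
combined_fit = {"DUP": 0.60, "X1": 1.26, "X2": -0.05}

report = coefficient_changes(original_fit, combined_fit)
for name, row in report.items():
    print(f"{name}: {row['pct_change']:.0f}% change, "
          f"sign flip={row['sign_flip']}, review={row['needs_review']}")
```

Under  this  assumed  rule  a  change  of  about  50%,  such  as  the  one  reported 
for  DUP,  is  recorded  but  does  not  by  itself  trigger  a  re-examination  of 
the  data. 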


Although  the  results  of  this  limited  validation  investigation 
appear  significant,  it  would  be  appropriate  to  increase  the 
validation  sample  size  in  order  to  perform  a  more  thorough 
investigation  of  the  stability  of  the  relationships.  As  more 
data  becomes  available,  a  cross  verification  of  coefficients 
should  be  performed  and  results  of  the  change  in  coefficients, 
statistics,  and  plots  should  be  reviewed.  It  might  be  that  other 
independent  variables  not  considered  in  this  study  should  be 
included  or  other  forms  of  curvature  (component-plus-residuals 
plots)  of  those  variables  already  included  could  further 
demonstrate  predictive  validity  of  the  relationships. 


SECTION  VI 


CONCLUSIONS  AND  RECOMMENDATIONS 

The  cost  and  parametric  estimating  relationships  obtained  in  this 
study  were  put  through  critical  statistical  examinations  which 
covered  a  wide  range  of  possible  functional  forms  with  a  diverse 
spread  across  areas  of  avionics  and  aircraft  types.  It  should  be 
emphasized  that  the  author  is  not  implying  that  the  correct 
functional  forms  of  the  relationships  have  definitely  been 
obtained,  but  that  there  are  statistically  significant 
correlations  between  the  independent  and  dependent  variables 
considered  in  this  study  which  can  be  used  for  predicting 
Operations  and  Maintenance  costs  of  avionics  equipment  early  in 
the  preliminary/conceptual  design  phase.  Although  the  results, 
statistical  (Section  IV)  and  validation  (Section  V),  appear 
significant,  there  are  still  areas  for  improvement  that  warrant 
further  study  to  increase  the  prediction  capability  of  the 
equations  developed. 

The  first  recommendation  is  to  expand  the  data  base,  i.e., 
increase  the  number  of  observations  used  to  develop  the 
relationships,  consider  additional  independent  variables,  extend 
the  ranges  of  the  variables  and  expand  to  newer  technology  areas. 
Other  independent  variables  not  considered  in  this  study  such  as 
the  number  of  SRUs  in  an  LRU,  the  number  of  integrated  circuits 
(ICs)  in  an  LRU  and  any  other  complexity  factors  that  may  have  an 
influential  effect  on  the  dependent  variables  to  be  estimated, 
should  be  introduced  into  the  regressions  to  reduce  bias  and 
improve  the  prediction  capability  of  the  equations.  Although 
many  of  the  variables  considered  had  ranges  over  several  orders 
of  magnitude,  some  may  not  have  been  observed  over  a  range 
wide  enough  to  display  their  influence.  Extending  the  ranges 
of  the  variables  and  using  more  "state  of  the  art"  equipment  in 
the  regression  data  base  will  enhance  the  capability  of 
predicting  advanced  equipment  costs. 

The  second  recommendation  is  to  perform  a  cross  verification  of 
coefficients  where  those  aircraft  and  avionics  equipment  that 
have  been  in  the  Air  Force  inventory  long  enough  to  experience 
"good"  data  (e.g.,  F-16)  make  up  the  second  sample  of  data.  An 
evaluation  of  the  results  could  lead  to  an  update  of  the 
relationships  already  developed. 

The  third  recommendation  is  to  consider  more  transformations  of 
the  variables.  The  transformations  considered  covered  a  wide 
range  of  possible  forms,  but  there  are  many  other  transformations 
that  may  better  approximate  the  more  complicated  cases.  For 
instance,  some  of  the  independent  variables  were  percentages 
which  covered  a  wide  range  of  values.  The  Inverse  Sine 
transformation  can  be  used  to  weigh  more  heavily  the  small 
percentages  which  have  small  variance.  In  addition,  cross 
products  of  the  variables  can  also  be  considered  as  viable 
transformations.  Again  there  must  be  more  data  available  to  give 
the  analyst  the  flexibility  needed  to  consider  many  different 
functional  forms. 
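
The  two  transformations  mentioned  above  can  be  sketched  in  Python  as 
follows.  The  report  does  not  give  the  formula  it  has  in  mind  for  the 
Inverse  Sine  transformation;  this  sketch  assumes  the  common  angular 
form,  the  arcsine  of  the  square  root  of  the  proportion.  The  function 
names  and  sample  values  are  hypothetical. 

```python
import math


def arcsine_sqrt(percent):
    """Angular transformation of a percentage: asin(sqrt(p)), p in [0, 1].

    Assumed form of the "Inverse Sine" transformation; the result is in
    radians and ranges from 0 (0%) to pi/2 (100%).
    """
    p = percent / 100.0
    if not 0.0 <= p <= 1.0:
        raise ValueError("percent must lie between 0 and 100")
    return math.asin(math.sqrt(p))


def cross_product(x1, x2):
    """Element-wise product of two variables, usable as an added regressor."""
    if len(x1) != len(x2):
        raise ValueError("variables must have the same number of observations")
    return [a * b for a, b in zip(x1, x2)]


# A one-point step near 0% is stretched more than the same step near 50%.
step_small = arcsine_sqrt(2.0) - arcsine_sqrt(1.0)
step_middle = arcsine_sqrt(51.0) - arcsine_sqrt(50.0)

# Made-up values for two independent variables and their interaction term.
interaction = cross_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Because  the  transformed  scale  spreads  out  values  near  0%,  differences 
among  small  percentages  carry  more  weight  in  the  fit  than  they  would  on 
the  raw  scale. 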

The  fourth  recommendation  is  to  investigate  the  possibilities  of 
considering  Non-linear  Regression  Analysis  as  a  means  of 
determining  the  correct  functional  form  of  the  equations. 

Although  the  relationships  considered  in  this  study  covered  a 
wide  range  of  possible  functional  forms,  Non-linear  Regression 
Analyses  can  be  used  to  approximate  even  more  complicated  cases. 

An  area  not  touched  upon  in  this  study  is  that  of  error  in  the 
independent  variables  (Assumption  A3).  Some  notable 
contributions  on  the  subject  of  error  in  the  independent 
variables  have  been  made  for  the  case  of  one  independent  variable 
(See  Bibliography:  Acton;  Hocking  and  Leslie;  Madansky).  It 
appears,  however,  that  there  are  no  results  now  in  the 
statistical  literature  that  lend  themselves  to  practical  applications  when 
multiple  variables  are  considered. 

We  feel  that  this  study  has  significantly  contributed  to  the  art 
of  estimating  support  costs  of  avionics  in  the  conceptual  phase. 
The  use  of  existing  Air  Force  Data  Systems  has  been  demonstrated 
by  the  statistically  significant  correlations  which  exist  between 
the  variables  considered,  yielding  relationships  which  can  be 
successfully  used  to  predict  future  costs.  The  Air  Force 
Avionics  Laboratory  has  been  provided  with  the  best  tool 
developed  to  date  to  predict  such  costs. 